FAST LAPPED IMAGE TRANSFORMS USING LIFTING STEPS

FIELD OF THE INVENTION

The current invention relates to the processing of images such as photographs, drawings, and other two dimensional displays. It further relates to the processing of such images which are captured in digital format or after they have been converted to or expressed in digital format. This invention further relates to use of novel coding methods to increase the speed and compression ratio for digital image storage and transmission while avoiding introduction of undesirable artifacts into the reconstructed images.

BACKGROUND OF THE INVENTION

In general, image processing is the analysis and manipulation of two-dimensional representations, which can comprise photographs, drawings, paintings, blueprints, x-rays of medical patients, or indeed abstract art or artistic patterns. These images are all two-dimensional arrays of information. Until fairly recently, images have comprised almost exclusively analog displays of analog information, for example, conventional photographs and motion pictures. Even the signals encoding television pictures, notwithstanding that the vertical scan comprises a finite number of lines, are fundamentally analog in nature.

Beginning in the early 1960's, images began to be captured or converted and stored as two-dimensional digital data, and digital image processing followed. At first images were recorded or transmitted in analog form and then converted to digital representation for manipulation on a computer. Currently digital capture and transmission are on their way to dominance, in part because of the advent of charge coupled device (CCD) image recording arrays and in part because of the availability of inexpensive high speed computers to store and manipulate images.

An important task of image processing is the correction or enhancement of a particular image. For example, digital enhancement of images of celestial objects taken by space probes has provided substantial scientific information. However, the current invention relates primarily to compression for transmission or storage of digital images and not to enhancement.

One of the problems with digital images is that a complete single image frame can require up to several megabytes of storage space or transmission bandwidth. That is, one of today's 3½ inch floppy discs can hold at best a little more than one gray-scale frame and sometimes substantially less than one whole frame. A full-page color picture, for example, uncompressed, can occupy 30 megabytes of storage space. Storing or transmitting the vast amounts of data which would be required for real-time uncompressed high resolution digital video is technologically daunting and virtually impossible for many important communication channels, such as the telephone line. The transmission of digital images from space probes can take many hours or even days if insufficiently compressed images are involved. Accordingly, there has been a decades long effort to develop methods of extracting from images the information essential to an aesthetically pleasing or scientifically useful picture without degrading the image quality too much and especially without introducing unsightly or confusing artifacts into the image.

The basic approach has usually involved some form of coding of picture intensities coupled with quantization. One approach is block coding; another approach, mathematically equivalent with proper phasing, is multiphase filter banks. Frequency based multi-band transforms have long found application in image coding. For instance, the JPEG image compression standard, W. B. Pennebaker and J. L. Mitchell, "JPEG: Still Image Compression Standard," Van Nostrand Reinhold, 1993, employs the 8x8 discrete cosine transform (DCT) at its transformation stage. At high bit rates, JPEG offers almost lossless reconstructed image quality. However, when more compression is needed, annoying blocking artifacts appear since the DCT bases are short and do not overlap, creating discontinuities at block boundaries.

The wavelet transform, on the other hand, with long, varying-length, and overlapping bases, has elegantly solved the blocking problem. However, the transform's computational complexity can be significantly higher than that of the DCT. This complexity gap is partly in terms of the number of arithmetical operations involved, but more importantly, in terms of the memory buffer space required. In particular, some implementations of the wavelet transform require many more operations per output coefficient as well as a large buffer.

An interesting alternative to wavelets is the lapped transform, e.g., H. S. Malvar, Signal Processing with Lapped Transforms, Artech House, 1992, where pixels from adjacent blocks are utilized in the calculation of transform coefficients for the working block. The lapped transforms outperform the DCT on two counts: (i) from the analysis viewpoint, they take into account inter-block correlation and hence provide better energy compaction; (ii) from the synthesis viewpoint, their overlapping basis functions decay asymptotically to zero at the ends, reducing blocking discontinuities dramatically.

Nevertheless, lapped transforms have not yet been able to supplant the unadorned DCT in international standard coding routines. The principal reason is that the modest improvement in coding performance available up to now has not been sufficient to justify the significant increase in computational complexity. In the prior art, therefore, lapped transforms remained too computationally complex for the benefits they provided. In particular, the previous lapped transforms somewhat reduced but did not eliminate the annoying blocking artifacts.

It is therefore an object of the current invention to provide a new transform which is simple and fast enough to replace the bare DCT in international standards, in particular in JPEG and MPEG-like coding standards. It is another object of this invention to provide an image transform which has overlapping basis functions so as to avoid blocking artifacts. It is a further object of this invention to provide a lapped transform which is approximately as fast as, but more efficient for compression than, the bare DCT. It is yet another object of this invention to provide dramatically improved speed and efficiency using a lapped transform with lifting steps in a butterfly structure with dyadic-rational coefficients. It is yet a further object of this invention to provide a transform structure such that for a negligible complexity surplus over the bare DCT a dramatic coding performance gain can be obtained both from a subjective and objective point of view while blocking artifacts are completely eliminated.

SUMMARY OF THE INVENTION

In the current invention, we use a family of lapped biorthogonal transforms implementing a small number of dyadic-rational lifting steps. The resulting transform, called the LiftLT, not only has high computation speed but is well-suited to implementation via VLSI.
Moreover, it also consistently outperforms state-of-the-art wavelet based coding systems in coding performance when the same quantizer and entropy coder are used. The LiftLT is a lapped biorthogonal transform using lifting steps in a modular lattice structure, the result of which is a fast, efficient, and robust encoding system. With only 1 more multiplication (which can also be implemented with shift-and-add operations), 22 more additions, and 4 more delay elements compared to the bare DCT, the LiftLT offers a fast, low-cost approach capable of straightforward VLSI implementation while providing reconstructed images which are high in quality, both objectively and subjectively. Despite its simplicity, the LiftLT provides a significant improvement in reconstructed image quality over the traditional DCT in that blocking is completely eliminated while at medium and high compression ratios ringing artifacts are reasonably contained. The performance of the LiftLT surpasses even that of the well-known 9/7-tap biorthogonal wavelet transform with irrational coefficients. The LiftLT's block-based structure also provides several other advantages: supporting parallel processing mode, facilitating region-of-interest coding and decoding, and processing large images under severe memory constraints.

Most generally, the current invention is an apparatus for block coding of windows of digitally represented images comprising a chain of lattices of lapped transforms with dyadic rational lifting steps. More particularly, this invention is a system of electronic devices which codes, stores or transmits, and decodes MxM sized blocks of digitally represented images, where M is an even number. The main block transform structure comprises a transform having M channels numbered 0 through M-1, half of said channel numbers being odd and half being even; a normalizer with a dyadic rational normalization factor in each of said M channels; two lifting steps with a first set of identical dyadic rational coefficients connecting each pair of adjacent numbered channels in a butterfly configuration, M/2 delay lines in the odd numbered channels; two inverse lifting steps with the first set of dyadic rational coefficients connecting each pair of adjacent numbered channels in a butterfly configuration; and two lifting steps with a second set of identical dyadic rational coefficients connecting each pair of adjacent odd numbered channels; means for transmission or storage of the transform output coefficients; and an inverse transform comprising M channels numbered 0 through M-1, half of said channel numbers being odd and half being even; two inverse lifting steps with dyadic rational coefficients connecting each pair of adjacent odd numbered channels; two lifting steps with dyadic rational coefficients connecting each pair of adjacent numbered channels in a butterfly configuration; M/2 delay lines in the even numbered channels; two inverse lifting steps with dyadic rational coefficients connecting each pair of adjacent numbered channels in a butterfly configuration; a denormalizer with a dyadic rational inverse normalization factor in each of said M channels; and a base inverse transform having M channels numbered 0 through M-1.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a polyphase representation of a linear phase perfect reconstruction filter bank.

FIG. 2 shows the most general lattice structure for linear phase lapped transforms with filter length L=KM.

FIG. 3 shows the parameterization of an invertible matrix via the singular value decomposition.

FIG. 4 portrays the basic butterfly lifting configuration.

FIG. 5 depicts the analysis LiftLT lattice drawn for M=8.

FIG. 6 depicts the synthesis LiftLT lattice drawn for M=8.

FIG. 7 depicts a VLSI implementation of the analysis filter bank operations.

FIG. 8 shows frequency and impulse responses of the 8x16 LiftLT: Left: analysis bank. Right: synthesis bank.

FIG. 9 portrays reconstructed "Barbara" images at 1:32 compression ratio.

DESCRIPTION OF THE PREFERRED EMBODIMENT

Typically, a block transform for image processing is applied to a block (or window) of, for example, 8x8 group of pixels and the process is iterated over the entire image. A biorthogonal transform in a block coder uses as a decomposition basis a complete set of basis vectors, similar to an orthogonal basis. However, the basis vectors are more general in that they may not be orthogonal to all other basis vectors. The restriction is that there is a "dual" basis to the original biorthogonal basis such that every vector in the original basis has a "dual" vector in the dual basis to which it is orthogonal. The basic idea of combining the concepts of biorthogonality and lapped transforms has already appeared in the prior art. The most general lattice for M-channel linear phase lapped biorthogonal transforms is presented in T. D. Tran, R. de Queiroz, and T. Q. Nguyen, "The generalized lapped biorthogonal transform," ICASSP, pp. 1441-1444, Seattle, May 1998, and in T. D. Tran, R. L. de Queiroz, and T. Q. Nguyen, "Linear phase perfect reconstruction filter bank: lattice structure, design, and application in image coding" (submitted to IEEE Trans. on Signal Processing, April 1998). A signal processing flow diagram of this well-known generalized filter bank is shown in FIG. 2.

In the current invention, which we call the Fast LiftLT, we apply lapped transforms based on using fast lifting steps in an M-channel uniform linear-phase perfect reconstruction filter bank, according to the generic polyphase representation of FIG. 1. In the lapped biorthogonal approach, the polyphase matrix E(z) can be factorized as

$$E(z) = G_{K-1}(z)\, G_{K-2}(z) \cdots G_1(z)\, E_0(z) \tag{1}$$

where

$$G_i(z) = \frac{1}{2}\begin{bmatrix} U_i & 0 \\ 0 & V_i \end{bmatrix}\begin{bmatrix} I & I \\ I & -I \end{bmatrix}\begin{bmatrix} I & 0 \\ 0 & z^{-1} I \end{bmatrix}\begin{bmatrix} I & I \\ I & -I \end{bmatrix} \equiv \frac{1}{2}\,\Phi_i\, W\, \Lambda(z)\, W, \tag{2}$$

and

$$E_0(z) = \frac{1}{\sqrt{2}}\begin{bmatrix} U_0 & U_0 J_{M/2} \\ V_0 J_{M/2} & -V_0 \end{bmatrix}. \tag{3}$$

In these equations, I is the identity matrix, and J is the matrix with 1's on the anti-diagonal.

The transform decomposition expressed by equations (1) through (3) is readily represented, as shown in FIG. 2, as a complete lattice replacing the "analysis" filter bank E(z) of FIG. 1. This decomposition results in a lattice of filters having length L=KM. (K is often called the overlapping factor.) Each cascading structure $G_i(z)$ increases the filter length by M. All $U_i$ and $V_i$, i = 0, 1, ..., K-1, are arbitrary M/2xM/2 invertible matrices. According to a theorem well known in the art, invertible matrices can be completely represented by their singular value decomposition (SVD), given by

$$U_i = U_{i0}\,\Gamma_i\,U_{i1}, \qquad V_i = V_{i0}\,\Delta_i\,V_{i1},$$



US 6,421,464 B1




where $U_{i0}$, $U_{i1}$, $V_{i0}$, $V_{i1}$ are diagonalizing orthogonal matrices and $\Gamma_i$, $\Delta_i$ are diagonal matrices with positive elements.

It is well known that any M/2xM/2 orthogonal matrix can be factorized into M(M-2)/8 plane rotations $\theta_i$ and that the diagonal matrices represent simply scaling factors $\alpha_i$. Accordingly, the most general LT lattice consists of KM(M-2)/2 two dimensional rotations and 2M diagonal scaling factors $\alpha_i$. Any invertible matrix can be expressed as a sequence of pairwise plane rotations $\theta_i$ and scaling factors $\alpha_i$ as shown in FIG. 3.
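To give a feel for the size of this general lattice, the counts quoted above can be evaluated for a concrete case. The following is only a small arithmetic sketch; the function name `general_lattice_counts` and its dictionary keys are hypothetical, and the formulas are taken directly from the preceding paragraph.

```python
def general_lattice_counts(M: int, K: int) -> dict:
    """Evaluate the parameter counts quoted in the text for the general LT lattice.

    M is the (even) number of channels and K the overlapping factor.
    The function name and dictionary keys are illustrative only.
    """
    if M < 2 or M % 2 != 0 or K < 1:
        raise ValueError("M must be a positive even integer and K >= 1")
    return {
        # plane rotations in one M/2 x M/2 orthogonal matrix: M(M-2)/8
        "rotations_per_orthogonal_matrix": M * (M - 2) // 8,
        # rotations in the whole lattice: KM(M-2)/2
        "total_rotations": K * M * (M - 2) // 2,
        # diagonal scaling factors: 2M
        "scaling_factors": 2 * M,
    }


counts = general_lattice_counts(M=8, K=2)
print(counts)
# {'rotations_per_orthogonal_matrix': 6, 'total_rotations': 48, 'scaling_factors': 16}
```

For the 8-channel, 16-tap case (M=8, K=2) the general lattice therefore carries 48 rotation angles and 16 scaling factors, which is the parameter load the simplified structure is meant to avoid.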

It is also well known that a plane rotation can be performed by 3 "shears":

$$\begin{bmatrix} \cos\theta_i & -\sin\theta_i \\ \sin\theta_i & \cos\theta_i \end{bmatrix} = \begin{bmatrix} 1 & \dfrac{\cos\theta_i - 1}{\sin\theta_i} \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ \sin\theta_i & 1 \end{bmatrix} \begin{bmatrix} 1 & \dfrac{\cos\theta_i - 1}{\sin\theta_i} \\ 0 & 1 \end{bmatrix}$$

This can be easily verified by computation.

Each of the factors above constitutes a "lifting" step in signal processing terminology. The product of two such steps effects a linear transform of pairs of coefficients:

$$\begin{bmatrix} a \\ b \end{bmatrix} \rightarrow \begin{bmatrix} 1 + up & u \\ p & 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix}$$

The signal processing flow diagram of this operation is shown in FIG. 4. The crossing arrangement of these flow paths is also referred to as a butterfly configuration. Each of the above "shears" can be written as a lifting step.

Combining the foregoing, the shears referred to can be expressed as computationally equivalent "lifting steps" in signal processing. In other words, we can replace each "rotation" by 3 closely-related lifting steps with butterfly structure. It is possible therefore to implement the complete LT lattice shown in FIG. 2 by 3KM(M-2)/2 lifting steps and 2M scaling multipliers.

In the simplest but currently preferred embodiment, to minimize the complexity of the transform we choose a small overlapping factor K=2 and set the initial stage $E_0$ to be the DCT itself. Many other coding transforms can serve for the base stage instead of the DCT, and it should be recognized that many other embodiments are possible and can be implemented by one skilled in the art of signal processing.

Following the observation in H. S. Malvar, "Lapped biorthogonal transforms for transform coding with reduced blocking and ringing artifacts," ICASSP97, Munich, April 1997, we apply a scaling factor to the first DCT's antisymmetric basis to generate synthesis LT basis functions whose end values decay smoothly to exact zero, a crucial advantage in blocking artifacts elimination. However, instead of scaling the analysis by $\sqrt{2}$ and the synthesis by $1/\sqrt{2}$, we opt for 25/16 and its inverse 16/25 since they allow the implementation of both analysis and synthesis banks in integer arithmetic. Another value that works almost as well as 25/16 is 5/4. To summarize, the following choices are made in the first stage: the combination of $U_{00}$ and $V_{00}$ with the previous butterfly form the DCT;

$\Delta_0 = \operatorname{diag}[\,\ldots\,]$

After 2 series of ±1 butterflies W and the delay chain $\Lambda(z)$, the LT symmetric basis functions already have good attenuation, especially at DC ($\omega = 0$). Hence, we can comfortably set $U_1 = I_{M/2}$.

As noted, $V_1$ is factorizable into a series of lifting steps and diagonal scalings. However, there are several problems: (i) the large number of lifting steps is costly in both speed and physical real-estate in VLSI implementation; (ii) the lifting steps are related; and (iii) it is not immediately obvious what choices of rotation angles will result in dyadic rational lifting multipliers. In the current invention, we approximate $V_1$ by (M/2)-1 combinations of block-diagonal predict-and-update lifting steps.

Here, the free parameters $u_i$ and $p_i$ of these lifting steps can be chosen arbitrarily and independently without affecting perfect reconstruction. The inverses are trivially obtained by switching the order and the sign of the lifting steps. Unlike popular lifting implementations of various wavelets, all of our lifting steps are of zero-order, namely operating in the same time epoch. In other words, we simply use a series of 2x2 upper or lower triangular matrices to parameterize the invertible matrix $V_1$.
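Two claims in the passage above lend themselves to a quick numerical check: that a plane rotation equals a product of three shears, and that a pair of lifting steps is undone by reversing the order and flipping the signs, whatever u and p are. The sketch below is illustrative only. The function names are hypothetical, and the predict-then-update ordering inside `lift_forward` is an assumption about how the two steps are sequenced.

```python
import math


def rotate(theta, a, b):
    """Reference plane rotation of the pair (a, b) by angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return c * a - s * b, s * a + c * b


def rotate_by_shears(theta, a, b):
    """Same rotation written as three shears (lifting steps).

    Requires sin(theta) != 0. The rightmost matrix factor acts first.
    """
    t = (math.cos(theta) - 1.0) / math.sin(theta)
    a = a + t * b                # upper-triangular shear
    b = b + math.sin(theta) * a  # lower-triangular shear
    a = a + t * b                # upper-triangular shear again
    return a, b


def lift_forward(a, b, u, p):
    """One predict step (b += p*a) followed by one update step (a += u*b).

    The ordering is an assumption made for this sketch.
    """
    b = b + p * a
    a = a + u * b
    return a, b


def lift_inverse(a, b, u, p):
    """Undo lift_forward: same steps in reverse order with opposite signs."""
    a = a - u * b
    b = b - p * a
    return a, b


if __name__ == "__main__":
    x, y = rotate(0.7, 3.0, -2.0)
    xs, ys = rotate_by_shears(0.7, 3.0, -2.0)
    print(abs(x - xs) < 1e-12, abs(y - ys) < 1e-12)

    fa, fb = lift_forward(5.0, 7.0, u=-0.5, p=0.25)
    print(lift_inverse(fa, fb, u=-0.5, p=0.25))
```

Because each step only adds a multiple of one channel to the other, the inverse never needs a division or a matrix inversion, which is why u and p can be chosen freely.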

Most importantly, fast-computable VLSI-friendly transforms are readily available when $u_i$ and $p_i$ are restricted to dyadic rational values, that is, rational fractions having (preferably small) powers of 2 denominators. With such coefficients, transform operations can for the most part be reduced to a small number of shifts and adds. In particular, setting all of the approximating lifting step coefficients to -1/2 yields a very fast and elegant lapped transform. With this choice, each lifting step can be implemented using only one simple bit shift and one addition.
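The shift-and-add idea can be sketched in integer arithmetic as follows. This is a minimal illustration, not the lattice of FIGS. 5 and 6: the helper names are hypothetical, the coefficient is written as num / 2**shift, and the use of a flooring right shift for rounding is an assumption, since the text does not specify a rounding rule.

```python
def lift(target: int, source: int, num: int, shift: int) -> int:
    """Add floor(num * source / 2**shift) to target using a right shift.

    With num = -1 and shift = 1 this applies the coefficient -1/2 with one
    negation, one shift, and one addition. Python's >> floors for negatives.
    """
    return target + ((num * source) >> shift)


def unlift(target: int, source: int, num: int, shift: int) -> int:
    """Exact inverse of lift: subtract the same shifted quantity."""
    return target - ((num * source) >> shift)


def forward_pair(a: int, b: int) -> tuple:
    """Two lifting steps with coefficient -1/2 on the pair (a, b)."""
    b = lift(b, a, -1, 1)
    a = lift(a, b, -1, 1)
    return a, b


def inverse_pair(a: int, b: int) -> tuple:
    """Reverse the order and the sign of the two steps."""
    a = unlift(a, b, -1, 1)
    b = unlift(b, a, -1, 1)
    return a, b


if __name__ == "__main__":
    print(forward_pair(7, -3))                 # (10, -7)
    print(inverse_pair(*forward_pair(7, -3)))  # (7, -3)
```

The inverse is exact on integers even though the shift discards a fractional bit, because the forward and inverse steps compute the identical shifted value from the same unchanged channel.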

The resulting LiftLT lattice structures are presented in FIGS. 5 and 6. The analysis filter shown in FIG. 5 comprises a DCT block 1, 25/16 normalization 2, a delay line 3 on four of the eight channels, a butterfly structured set of lifting steps 5, and a set of four fast dyadic lifting steps 6. The frequency and impulse responses of the 8x16 LiftLT's basis functions are depicted in FIG. 8.

The inverse or synthesis lattice is shown in FIG. 6. This system comprises a set of four fast dyadic lifting steps 11, a butterfly-structured set of lifting steps 12, a delay line 13 on four of the eight channels, 16/25 inverse normalization 14, and an inverse DCT block 15. FIG. 8 also shows the frequency and impulse responses of the synthesis lattice.

The LiftLT is sufficiently fast for many applications, especially in hardware, since most of the incrementally added computation comes from the 2 butterflies and the 6 shift-and-add lifting steps. It is faster than the type-I fast LOT described in H. S. Malvar, Signal Processing with Lapped Transforms, Artech House, 1992. Besides its low complexity, the LiftLT possesses many characteristics of a high-performance transform in image compression: (i) it has high energy compaction due to a high coding gain and a low attenuation near DC where most of the image energy is concentrated; (ii) its synthesis basis functions also decay smoothly to zero, resulting in blocking-free reconstructed images.

Comparisons of complexity and performance between the LiftLT and other popular transforms are tabulated in Table 1 and Table 2. The LiftLT's performance is already very close to that of the optimal generalized lapped biorthogonal transform, while its complexity is the lowest amongst the transforms except for the DCT.

To assess the new method in image coding, we compared images coded and decoded with four different transforms:

DCT: 8-channel, 8-tap filters

Type-I Fast LOT: 8-channel, 16-tap filters

LiftLT: 8-channel, 16-tap filters

Wavelet: 9/7-tap biorthogonal.






